A Novel Fuzzy Chinese Address Matching Engine Based on Full-text Search Technology

Authors

  • Xiaojing Yao
  • Xiang Li
  • Ling Peng
  • Tianhe Chi
Abstract

The ability to locate addresses is one of the most important features of an urban geographic information system. Because Chinese addresses cannot be geocoded with European geocoding methods, several Chinese researchers have studied Chinese geocoding specifically. Existing research focuses on address standardization and address models and pays less attention to user input and result control. We designed a novel fuzzy Chinese address matching engine that gives users freedom in both input and result control. The engine, based on full-text search, consists of an index builder and a retrieval locator. In addition, three fuzzy matching methods are implemented: Searching-Box Fuzzy Single Match, One-to-One Fuzzy Single Match, and Table-Form Fuzzy Batch Match. Tested on more than 50,000 address points from eight districts of Beijing, the engine shows clear advantages over a traditional database retrieval system: (1) it matches more efficiently on large data sets; (2) it offers greater freedom in user input and result control. In the quality test, when the fuzzy matching degree threshold is higher than 0.75, the accuracy rate reaches 100%, with a matching rate of 83% and a recall rate of 92%.
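The abstract does not say how addresses are tokenized or how the fuzzy matching degree is computed. The sketch below illustrates the general index-builder / retrieval-locator split under stated assumptions: addresses are tokenized into overlapping character bigrams (a common choice for Chinese full-text indexing), an inverted index maps each bigram to address IDs, and the Dice coefficient over bigram sets stands in for the matching degree, with 0.75 as the default cutoff. `bigrams`, `AddressIndex`, and `batch_match` are hypothetical names, not the authors' engine.

```python
from collections import defaultdict


def bigrams(text):
    """Split an address into overlapping character bigrams (assumed tokenizer)."""
    text = "".join(text.split())  # drop any whitespace
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


class AddressIndex:
    """Hypothetical index builder + retrieval locator over an inverted index."""

    def __init__(self):
        self.postings = defaultdict(set)  # bigram -> set of address ids
        self.addresses = []

    def add(self, address):
        """Index builder: register each distinct bigram of the address."""
        doc_id = len(self.addresses)
        self.addresses.append(address)
        for gram in set(bigrams(address)):
            self.postings[gram].add(doc_id)
        return doc_id

    def search(self, query, threshold=0.75):
        """Retrieval locator: return (address, degree) pairs at or above threshold."""
        q_grams = set(bigrams(query))
        if not q_grams:
            return []
        # Count shared bigrams per candidate using only the postings lists.
        shared = defaultdict(int)
        for gram in q_grams:
            for doc_id in self.postings.get(gram, ()):
                shared[doc_id] += 1
        results = []
        for doc_id, n in shared.items():
            d_grams = set(bigrams(self.addresses[doc_id]))
            # Assumed matching degree: Dice coefficient of the two bigram sets.
            degree = 2 * n / (len(q_grams) + len(d_grams))
            if degree >= threshold:
                results.append((self.addresses[doc_id], round(degree, 3)))
        results.sort(key=lambda r: (-r[1], r[0]))  # best match first
        return results


def batch_match(index, queries, threshold=0.75):
    """Batch mode sketch: best (address, degree) or None for each query, in order."""
    out = []
    for q in queries:
        hits = index.search(q, threshold)
        out.append(hits[0] if hits else None)
    return out


if __name__ == "__main__":
    idx = AddressIndex()
    for a in ["北京市海淀区中关村大街27号", "北京市朝阳区建国路88号", "北京市海淀区中关村大街1号"]:
        idx.add(a)
    print(idx.search("海淀区中关村大街27号"))
    print(batch_match(idx, ["海淀区中关村大街27号", "上海市"]))
```

Here the query omits the city prefix yet still matches the full address, while the near-miss "中关村大街1号" scores about 0.64 and is filtered out by the 0.75 cutoff. Using postings lists for candidate generation means only addresses sharing at least one bigram with the query are scored, which is the usual reason full-text indexes scale better than scanning a database table.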

Similar resources

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Application of Full Text Search Engine Based on Lucene

This paper introduces us the full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture, compares the full-text search of Lucene with the String search retrieval’s response time, the experimental results show that the full text search of Lucene has faster retrieval speed.

WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

We describe our experience in developing Web Search Systems using Oracle’s SQL*TextRetrieval. In the prototype system we store on-line books in the HTML and the HTML documents of a web site, SQL*TextRetrieval is used to index full text and other structured data in the ’web space’ and to provide an efficient search engine for free-text search. The Web enables global access to and maximum informa...

SEARCH ENGINE IN LARGE-SCALE PEER-TO-PEER SYSTEMS by AKSHAY LAL

LAL, AKSHAY. Dgoogle: A Full-Text Search Engine in Large-Scale Peer-to-Peer Systems. (Under the direction of Professor Khaled Harfoush). Full-text search engines like Google serve an important role in accessing Internet resources. In such engines, a search for web pages, matching a user’s query, are typically carried on a set of co-administered, physically co-located clusters of servers. Full-...

A Web-based Kernel Function for Matching Short Text Snippets

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by levera...

Publication date: 2015